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Abstract. In this paper we describe a functorial data migration [6] sce¬ 
nario about the manufacturing service capability of a distributed supply 
chain. The scenario is a category-theoretic analog of an ontology-based 
“semantic enrichment” scenario 5^ developed at the National Institute 
of Standards and Technology (NIST). The scenario is presented using, 
and is included with, the open-source FQL tool, available for download 
at categoricaldata.net/fql.html. 


1 Introduction to Functorial Data Migration 

In the functorial data model , which originated with Rosebrugh and others in the 
late 1990s |3 ], a database schema is a finitely presented category jl] (essentially, a 
directed multi-graph and path equality constraints) and a database instance on a 
schema S' is a set-valued functor from S (essentially, a set of tables). The database 
instances on a schema S constitute a category, denoted S- Inst, and a functor 
F: S -T T between schemas S and T induces three adjoint data migration 
functors: Ap\ T-Inst -A S- Inst, defined as Ap(I) := I o F, and the left and 
right adjoints to Ap, respectively: Up: S-Inst -A T-Inst and lip: S-Inst -A 
T-Inst. These data migration functors provide a category-theoretic alternative 
to traditional, set-theoretic operations for information integration such as SQL 
and the chase [2]. 

We have developed a simple algebraic query language for the functorial data 
model, FQL (for Functorial Query Language), as well as a corresponding in¬ 
tegrated development environment (IDE), the FQL IDE. The FQL IDE is an 
FQL code editor, a FQL AA SQL translator, a FQL execution engine, and a data 
visualization tool designed in the spirit of the schema-mapping tool Clio [4]. 
The FQL IDE is open source, written in java, and available for download at 
categoricaldata.net/fql.html. In this paper, we demonstrate how the FQL IDE is 
used in practice by describing an example data migration scenario developed in 
collaboration with the National Institute of Standards and Technology (NIST). 


** Work supported by ONR grant N000141310260, AFOSR grant FA9550-14-1-0031, 
and NASA grant NNH13ZEA001N-SSAT. 
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Remark. Rosebrugh et al’s original model [3; has a number of theoretical 
issues that prevent it from being used directly as a basis for information inte¬ 
gration. First, Rosebrugh’s model cannot store meaningful data such as strings 
and integers; it can only store meaningless identifiers (IDs). Second, Rosebrugh’s 
model cannot interoperate with SQL. Hence, FQL is actually based on an exten¬ 
sion of Rosebrugh’s model, described in [7J. The exact definition of this extension 
does not matter for the purposes of this paper. 


2 An Enrichment Scenario 

The example described in this paper is an FQL analog of a “semantic enrich¬ 
ment” scenario developed at NIST and published as [5]. In this scenario, a 
database (called Portal A in [5]) contains information about equipment, includ¬ 
ing the capabilities of such equipment; for example, that a particular machine m 
can drill holes as small as .5cm in metal. The goal of the scenario is to “enrich” 
Portal A’s data with additional 3rd party information about materials, so that, 
for example, Portal A’s data also contains the fact that m can drill holes in iron, 
because iron is a kind of metal. 

In [5], Portal A’s database is a Microsoft Access database, the 3rd party 
enriching information about materials is an OWL (Web Ontology Language) 
ontology, and enrichment is done by invoking a black-box OWL reasoner on an 
input query, Portal A’s data, the OWL ontology about materials, and an OWL 
ontology relating portal A’s vocabulary (e.g., “iron”) and the material ontology 
vocabulary (e.g., “ferrous”). In this paper, we simplify this scenario as follows: 
we assume Portal A’s data is given as a SQL database, that the ontology about 
materials is simply an “is-a” parenthood function, and that the correspondence 
between Portal A’s vocabulary and the is-a hierarchy vocabulary is a “synonyms” 
relation between sets of words. 

Our FQL development consists of three main steps: 

1. First, we import Portal A’s data, the is-a hierarchy, and the synonyms into 
FQL. (Section 2.1) 

2. Second, we transitively close the is-a hierarchy, join it with the synonymns 
relation, and then join the result to Portal A’s data. (Section 2.2) 

3. Finally, we test the result of our enrichment on a particular query (query 
1 from 0). This query gives additional results on the enriched data, which 
demonstrates that FQL can be used to do semantic enrichment along the 
lines described in [5]. (Section 2.3) 

Although we only have space to sketch the outline of the development, the entire 
development - about 2000 lines of FQL code, 1800 lines of which are schema 
and data definitions - is included as a built-in example in the FQL IDEj^] 

3 There are three variants of the FQL IDE, each of which implements a slightly dif¬ 
ferent language. The example in this paper is “P NIST Full” in the “FPQL IDE”. 
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2.1 Step 1: Import relational data 

The schema (Figure [l]) for Portal A’s data (Figure [2| is a SQL schema in cate¬ 
gorical normal form [6]: every table consists of a distinguished (primary key) ID 
column, a set of “attribute” columns whose values contain strings or integers, 
and a set of foreign key columns whose values contain IDs that refer to other 
tables. Consequently, Portal A’s schema can be regarded as the presentation of 
a category: the objects of the category are the table names and type names, 
and the arrows between objects are the foreign key or attribute columns in the 
schema. An instance on Portal A’s schema, which physically is a set of relations, 
can then be regarded as a set-valued functor. 

The actual Portal A schema as visualized in Microsoft Access is shown in 
Figure |TJ and a snippet of the SQL commands defining the Portal A data are 
shown in Figure [2] The FQL IDE imports these SQL commands and emits 
corresponding FQL code that defines an FQL schema and an FQL instance on 
that schema. A portion of Portal A’s data, as displayed in the FQL IDE, is 
shown in Figure [3] 



Fig. 1 . Schema for Portal A 


CREATE TABLE unitcode ( 

id INT PRIMARY KEY, Code VARCHAR(255), Description VARCHAR(255) 

); 

INSERT INTO unitcode VALUES 

(1,"EA","Each part/piece count"), 

(2,"Thousands","1000 parts/pieces count"), 

(3,"Inch","Length measure in inches"), 

(4,"mm","Length measure in millimeters"), 

(5,"cm","Length measure in centimeters"); 


Fig. 2. Snippet of SQL for Portal A 
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• • • P NIST Data - 9:30:48 PM | (exec: 1s)(gui: 27s) 



Fig. 3. Portal A data displayed in the FQL IDE 
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Fig. 4. The reflexive transitive closure of a function I is Af 0 (I) U Af x (/) U ... 


Two additional inputs are specified in the original scenario [5]: an OWL on¬ 
tology X containing myriad facts about materials (e.g., steel is a metal), and 
an OWL ontology relating the vocabulary used by X (e.g., “ferrous”) to the 
vocabulary used by portal A (e.g., “iron”). At present we do not have a good 
understanding about how OWL relates to FQL. So, we went through these on¬ 
tologies by hand and stripped out relevant data. The result was 

- a (total) function parent : O -A O, where set O is the set of words from the 
ontology, and 

- a synonyms relation syn C O x N where N is the set of words from Portal A. 
We do not require that syn be an equivalence relation, and for our particular 
data, it is not. 

We encode the parent function as an instance on the S schema in Figure [4j and 
because it turns out that N and O are disjoint, we can encode the syn relation 
as an instance on the T in Figure [4] by treating the target node of both edges 
as representing N U O. If N and O were not disjoint, we would need to use a 
span [T schema with three, rather than two, nodes to encode the syn relation, 
but our development would be mostly the same. 
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2.2 Step 2: Process imported data 

We enrich Portal A’s data using the 17, A, II data migrations (defined in Sec¬ 
tion 1) as follows. We begin by computing the reflexive, transitive closure of 
the parenthood function, resulting in an isa relation. To do this, we define, in 
Figure [4j for each natural number n, a functor F n : T —>> S from the schema for 
a relation (T) to the schema for a function (S). Given an S'—instance (e.g., the 
parent function) /, Ap n (I) computes, as a T—instance (i.e., relation), the n-ary 
composition of /, i.e., J n , with the 0-th composition being the reflexive closure 
of I. The reflexive transitive closure of I is then the union Ap 0 {I) U A Fl (I)U.... 
For this example, we used n = 3. Taking the union of two instances on the same 
schema is a built-in FQL primitive^] A portion of the resulting isa relation, as 
displayed in the FQL IDE, is shown in Figure [6j 


# # # P NIST IsA - 9:54:45 PM (exec: 13s)(gui: 14s) 


Select: 


schema isa_preschema 
instance isa_pre 
schema isa_schema 
instance isa 

schema isa_schema_small 
mapping M 



mapping LO 
mapping LI 
mapping L2 
mapping L3 
instance isa_rtO 
instance isajrtl 
instance isa_rt2 
instance isa_rt3 
instance isa_rt 
instance isa_rt_small 


Full Tables 


A (27) rows 

A 1 (adorn) 

r(adorn) 

... Castlron 

Ferrous 

... Ceramic 

Material 

... Composite 

Material 

... Crystalline 

Ceramic 

... DieSteel 

Ferrous 

... Elastomer 

Polymer 

... Ferrous 

Metal 

... Glass 

NonCrystalline 

... GlassCeramics 

NonCrystalline 

... Hastelloy 

SuperAlloy 

... Hermoset 

Plastic 

... leone 1 

SuperAlloy 

... Material 

Material 

... Metal 

Material 

... NonCrystalline 

Ceramic 

... NonFerrous 

Metal 

... Nylon 

Thermoplastic 

... PVC 

Thermoplastic 

... Plastic 

Polymer 

... Polyester 

Thermoplastic 

... Polymer 

Material 

... StainlessSteel 

Steel 

... Steel 

Ferrous 

... Stellite 

SuperAlloy 

... SuperAlloy 

Metal 

... Thermoplastic 

Plastic 

... ToolSteel 

Ferrous 


Fig. 5. Initial “is-a” parent function displayed in the FQL IDE 


4 Technically, FQL has two primitives, disjoint union and relationalization (which 
equates IDs that are “observationally equivalent”). For SQL data, such as in this 
example, disjoint union followed by relationalization implements union. 
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• # • 


P NIST IsA - 9:54:45 PM | {exec: 13s)(gui: 14s) 


Select: 

schema isa_preschema 
instance isa_pre 
schema isa_schema 
instance isa 

schema isa_schema_small 
mapping M 
instance isa_small 
mapping LO 
mapping LI 
mapping L2 
mapping L3 
instance isa_rtO 
instance isa_rtl 
instance isa_rt2 
instance isa_rt3 
instance isa rt 


| Text Graph 


A (90) rows 


I (adorn) 

.... Castlron 
.... Ceramic 
_... Crystalline 
.... GlassCeramics 
.... Glass 

_... NonCrystalline 
_... Composite 
_... Crystalline 
.... DieSteel 
.... Elastomer 
... Castlron 
.... DieSteel 
... Ferrous 
... StainlessSteel 
... Steel 
... ToolSteel 
... GlassCeramics 
.... Glass 
_... Hastelloy 
Hermnset 


r(adorn) 

Castlron 

Ceramic 

Ceramic 

Ceramic 

Ceramic 

Ceramic 

Composite 

Crystalline 

DieSteel 

Elastomer 

Ferrous 

Ferrous 

Ferrous 

Ferrous 

Ferrous 

Ferrous 

GlassCeramics 

Glass 

Hastelloy 

Hermnset 


Fig. 6. Transitively closed “is-a” relation displayed in the FQL IDE 


We now have a relation (T—instance) isa C O x O, where O is the set of words 
from the materials ontology, and we have a relation (T—instance) syn cOxiV, 
where N is the set of words from Portal A. We next compute a translation of isa 
to use words from Portal A by joining isa with syn resulting in a new relation 
(T-instance) isa" C N x TV; formally, we are computing op(syn); isa; syn, where 
denotes relation composition and (x,y) E op(R ) if and only if (y,x) E R. 
Finally, we must compute the reflexive transitive closure of isa” , which we will 
denote isa’. 

To specify how to compute isa' we use FQL’s “select/from/where” syntax; an 
example of this syntax is shown in Figure[9j Note that FQL’s select/from/where 
syntax is syntactic sugar: the select/from/where syntax is equivalent to a data 
migration of the form E o 77 o A 

Now that we have the isa’ relation (T—instance) on Portal A’s vocabulary, we 
enrich Portal A’s data by joining it and the isa’ relation together. Conceptually, 
the enrichment process is similar to the process where isa C O x O was enriched 
by syn C O x TV, resulting in isa' C TV x TV; however, because Portal A’s schema 
is not a simple relation schema, it is impractical to write the FQL code for 
the enrichment by hand, even using FQL’s select/from/where syntax. Hence, 
we developed an FQL extension to generate the required FQL code from the 
definition of Portal A’s schema. The result of enrichment is a new, larger instance 
on Portal A’s schema. 
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2.3 Step 3: Query processed data 


Having enriched Portal A’s data, we can query it, using query 1 from [5]. The 
query we are using is written in FQL’s select/from/where syntax and is shown 
in Figure [9j Before enrichment, this query returns only two rows (Figure [7]). 
After enrichment, this query returns many more rows (Figure [8]), because the 
isa ’ relation contains many kinds of pre-hardened stainless steel. (Note that “Pre¬ 
hardened stainless steel” does not appear in Figure [5] because that term is used 
by Portal A but not by the 3rd party OWL materials ontology). 


P Flower 2- 10:19:25 PM 


mapping LI 
mapping L2 
mapping 13 
instance Isa_rt0 


schema syn_schema 
instance synjnst 
schema combined 
instance instl 
Instance inst2 
instance inst3 
schema isa_schema_sr 
schema isa_schema_ne 
mapping M_new 
instance isa.syn 
instance almost2 


schema S 




Fig. 7. Query result on initial data displayed in the FQL IDE 



Fig. 8. Query result on enriched data displayed in the FQL IDE 















































3 Conclusion 


By implementing this example we have shown that FQL can express “semantic 
enrichments” similar to those described in [5]. However, this particular example 
is innately relational: all of the data migrations described in this paper can 
be implemented in SQL, albeit more verbosely than in FQL. One promising 
direction for future work is to implement in FQL an enrichment scenario from J5] 
whose semantics cannot be expressed in SQL, although this will require both an 
understanding of the relationship between OWL and FQL and a formalization of 
what exactly the “black-box OWL reasoned’ employed in [5] is doing. The FQL 
IDE includes several information integration examples that cannot be expressed 
in SQL, but they are smaller than the example described in this paper and they 
are not “enrichments” in the sense of [5]. 

We also learned a valuable lesson in functorial query language design and im¬ 
plementation by developing this example. Not only does FQL’s select/from/where 
query syntax save time and effort compared to writing U o 77 o A migrations, 
in many cases we were able to write select/from/where queries when we had 
no idea how to write the corresponding U o 77 o A migration. Moreover, FQL’s 
select/from/where queries can be executed directly in a more efficient manner 
than by translation to a migration of the form U o 77 o A The reason is that 
many techniques from relational database theory, such as join re-ordering, can 
be applied directly to select/from/where syntax. Hence we conclude that se¬ 
lect/from/where syntax should be primitive in any functorial query language. 
The mathematical foundations of select/from/where queries are described in [8]. 

Disclaimer. Mention of commercial products or services in this paper does 
not imply approval or endorsement by NIST, nor does it imply that such products 
or services are necessarily the best available for the purpose. 
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select 

m.material_Material_Name as mn, 
c.capability_Capability_Name as ccn, 
c.capability_Max_Length as ml, 
uc.unitcode_Code as ucc, 

pose.productorservicecategory_Category_Name as pen 
from 

productorservicecategory as pose, 
material as m, 
unitcode as uc, 
capability as c, 
capabilitymaterials as cmX, 
capabilitycategories as cc 
where 

c = cmX.capabilitymaterials_Capability_id and 
uc = c.capability_Max_Length_Unit and 
uc.unitcode_Code="cm" and 

m = cmX.capabilitymaterials_Material_id and 
c = cc.capabilitycategories_Capability_id and 

pose = cc.capabilitycategories_ProductOrServiceCategory_id and 
(m.material_Material_Name="Pre-hardened Stainless Steel" or 
m.material_Material_Name="17-4 Stainless Steel") and 
(pose.productorservicecategory_Category_Name="Sinker EDM" or 
pose.productorservicecategory_Category_Name="Ram EDM") 


Fig. 9. FQL syntax for Query 1 [5], translated from SQL 


